Topic models have been successfully applied to lexicon extraction. However, most previous methods are limited to document-aligned data. In this paper, we address two challenges of applying topic models to lexicon extraction from non-parallel data: 1) the difficulty of modeling word relationships and 2) noise in the seed dictionary. To address these challenges, we propose two new bilingual topic models that better capture the semantic information of each word while discriminating among the multiple translations in a noisy seed dictionary. We extend the scope of topic models by inverting the roles of "word" and "document". In addition, to handle the noise in the seed dictionary, we incorporate the probability of translation selection into our models. Moreover, we propose an effective measure to evaluate the similarity of words in different languages and select the optimal translation pairs. Experimental results on real-world data demonstrate the utility and efficacy of the proposed models.